- To: jaffe@playfair.rutgers.edu (Saul)
- cc: bryan@umich.edu, bind-4.9@inet-gw-2.pa.dec.com
- From: David Bolen <db3l@ans.net>
- Subject: Re: Bind 4.9/named
- Date: Fri, 16 Apr 93 15:58:26 EDT
- ----------------------------------
-
- > From Saul Jaffe <jaffe@noc.rutgers.edu>
- >
- > > From: Bryan Beecher <bryan@umich.edu>
- > >
- > > Do you know about SA (Shuffle Address) records? (...)
- >
- > Actually, no I didn't know about them.
- >
- > > Maybe SA and CIP records are really the same thing . . .
- >
- > Maybe they are - can you give me a little more detail on how they work?
-
- Having just implemented yet another variant of this idea, perhaps I
- can help with a comparison. It so happens that just this past week
- I've been going through both the CIP and SA versions of bind before
- working on a POOL (yep - yet another name :-)) record type for our use
- here at ANS.
-
- Anyone can feel free to correct anything amiss below - it's just my
- impression of these particular implementations. As it turns out,
- neither was really suited to my purpose, which is why I built
- the POOL record. I've included some info on my POOL work as another
- point of comparison:
-
-
- SA Records:
-
- * Implemented as a 4.8.3 variant.
- * "SA" in config file. Stored within nameserver as T_A with a new
- db flag DB_F_SHUFFLE on record. Syntax in config file is the same
- as an A record.
- * Following the formation of a reply, if any DB_F_SHUFFLE records
- were inserted into the answer, the answer is "shuffled". Shuffling
- involves randomly picking a new starting point in the set of
- RRs in the answer, and then reordering the answer starting at
- that point. For example, if an answer had:
- 1.1.1.1, 2.2.2.2, 3.3.3.3, 4.4.4.4
- the shuffling would randomly pick a starting point (say
- 3.3.3.3) and then continue with the same order, such as:
- 3.3.3.3, 4.4.4.4, 1.1.1.1, 2.2.2.2
- There is a static limit (currently 16 - arbitrary compiled-in
- choice) to the number of records that can be shuffled. If more
- records (of any type) are present in the answer, then no
- shuffling is done at all.
- * Secondary nameservers are assumed not to understand SA records,
- which are zone transferred simply as T_A records. If you give
- the -s flag to named, it assumes that all secondary nameservers
- understand SA and it translates any T_A records with a DB_F_SHUFFLE
- flag into T_SA records during a zone transfer.
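
To make the SA shuffling step concrete, here is a minimal sketch in Python (not the C of the actual bind sources). The names shuffle_answer and MAX_SHUFFLE are hypothetical; only the rotate-from-a-random-start behaviour and the limit of 16 come from the description above.

```python
import random

MAX_SHUFFLE = 16  # compiled-in limit described above; the name is made up


def shuffle_answer(records, rng=random, limit=MAX_SHUFFLE):
    """Rotate the answer so it starts at a randomly chosen record.

    The relative (circular) order of the records is preserved. If the
    answer holds more than `limit` records, it is returned unchanged.
    """
    records = list(records)
    if len(records) < 2 or len(records) > limit:
        return records
    start = rng.randrange(len(records))
    return records[start:] + records[:start]


if __name__ == "__main__":
    answer = ["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"]
    print(shuffle_answer(answer))
```

Note that every client still sees the same circular order; only the starting point changes.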
-
- CIP Records:
-
- * Implemented as a 4.8.3 variant.
- * "CIP" in config file. Stored within nameserver as T_CIP record.
- Syntax in config file is like an MX record. The numeric value
- before the hostname is a ranking for that member of the CIP pool
- used to control the chances that specific entry of the pool is
- returned in response to a query.
- * As part of the formation of a reply, a check is made to see if
- no answers were located, and the request was for T_A records.
- If no T_A records were found, a search is performed for T_CIP
- records, which are then translated into an actual address ala
- CNAME processing, which in turn is then returned to the user as
- a T_A answer. The selection of the appropriate T_CIP record is
- done randomly but is weighted according to the preference
- assigned to each T_CIP record per name. Only the single selected
- T_CIP record is returned.
- * Prior to searching for a reply to a request, any CIP records
- in the matching namebuf are rotated in a round-robin fashion.
- (This doesn't seem to have any effect on reply formation - maybe
- a leftover piece of code?)
- * Secondary nameservers are assumed to understand CIP records,
- which are zone transferred as T_CIP records.
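
The weighted CIP selection can be sketched roughly as below. This is an illustration only: pick_cip is a hypothetical name, and the sketch assumes that a larger ranking value makes an entry more likely to be chosen, which the description above does not actually state.

```python
import random


def pick_cip(cip_records, rng=random):
    """Pick one hostname from a list of (ranking, hostname) pairs.

    Assumption: a higher ranking means a proportionally higher chance
    of selection. Only a single record is returned, as with CIP.
    Raises ValueError if all rankings are zero.
    """
    weights = [rank for rank, _ in cip_records]
    names = [name for _, name in cip_records]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    pool = [(10, "a.example.com"), (30, "b.example.com")]
    print(pick_cip(pool))
```

The chosen hostname would then be resolved to an address and returned as a T_A answer, much like following a CNAME.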
-
-
- POOL Records:
-
- * Implemented as a 4.9 Beta (3/15) variant.
- * "POOL" in config file. Stored within nameserver as T_POOL record.
- Syntax in config file is the same as an A record.
- * Following the formation of a reply, if any T_POOL records were
- inserted into the answer, the answer will be "fixed". Fixing the
- answer involves changing the T_POOL record type to T_A, and then
- reordering the response. During reordering, the first T_POOL
- record in the answer is left alone. Any remaining T_POOL records
- are randomly ordered. Thus, an answer of:
- 1.1.1.1, 2.2.2.2, 3.3.3.3, 4.4.4.4
- might be reordered to become:
- 1.1.1.1, 4.4.4.4, 2.2.2.2, 3.3.3.3
- There is a static limit (currently 20 - arbitrary compiled-in
- constant) to the number of T_POOL records that can be reordered.
- If more than this many T_POOL records are in the answer, then
- only the first 20 (or whatever) T_POOL records are reordered.
- * Also after the formation of a reply, the T_POOL records within
- the database entry are rotated in a round-robin fashion. This
- means that the primary answer in a pool will always rotate
- through the pool round-robin, and the remainder of the pool will
- always be returned as part of the answer, but in random order.
- * By default, secondary nameservers are assumed not to understand
- T_POOL records, which are zone transferred as T_A records. A
- new boot file option (poolns) was added to specify a list of
- nameservers (ala the xfernets/bogusns options) that understand
- T_POOL records. Zone transfers from any of those nameservers will
- receive the pool records as T_POOL records.
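
The two POOL steps (fix the answer, then rotate the database entry) can be sketched as follows. This is Python for illustration, not the actual bind patch; fix_answer, answer_query and MAX_POOL are hypothetical names. It assumes the limit of 20 counts the first record, and it models only the addresses, not the T_POOL to T_A type change.

```python
import random
from collections import deque

MAX_POOL = 20  # compiled-in limit described above; the name is made up


def fix_answer(pool, rng=random, limit=MAX_POOL):
    """Keep the first address in place and randomly order the rest.

    Only the first `limit` addresses take part in the reordering;
    anything beyond that is appended in its original order.
    """
    window = list(pool[:limit])
    tail = list(pool[limit:])
    if len(window) > 1:
        rest = window[1:]
        rng.shuffle(rest)
        window = window[:1] + rest
    return window + tail


def answer_query(db_pool, rng=random):
    """Build one reply, then rotate the stored pool round-robin."""
    answer = fix_answer(list(db_pool), rng)
    db_pool.rotate(-1)  # the next query gets a different primary answer
    return answer


if __name__ == "__main__":
    db = deque(["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"])
    for _ in range(3):
        print(answer_query(db))
```

Successive replies lead with 1.1.1.1, then 2.2.2.2, and so on, while the addresses after the first come back in random order each time.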
-
-
- General Comments:
-
- * Both SA and CIP implementations chose 104 as the record type. I
- chose 200 for POOL just to be different, and to avoid a clash.
- * I'm not absolutely sure, but it looks like the shuffling code
- in the SA implementation won't work unless you have nothing but
- T_A/T_SA records for a node. In addition, as long as you have
- one T_SA record, all T_SA *and T_A* records are shuffled.
- * CIP's use of names rather than addresses avoids the problem of
- having to enter pool member addresses twice - once for the actual
- A record and once as part of the pool. This is nice. I want to
- move my POOL stuff in that direction, but was time constrained so
- doing the A format was quicker initially.
- * The ranking used by CIP is nice, but requires more thought as to
- how to deal with cases where you want a machine to have a higher
- probability of being the primary response, but at the same time
- you don't really want it returned twice in a row in general. You'd
- just like it selected more often over a longer time scale.
- * The fact that SA and POOL return the entire pool of addresses
- is nice for providing a rollover in the case of a problem at the
- primary address.
-
-
- In general, the reason I wrote the code for POOL records was to add
- some semantics over the other two implementations:
-
- vs. CIP:
-
- * I need to return multiple answers. The CPTs have a problem with
- simultaneous arrival of TCP open requests, so I need to have
- alternate addresses in the response for clients to fall over to.
-
- vs. SA:
-
- * I didn't want the set of returned addresses to always follow
- the same order. That causes all clients to rollover the same
- way, which can overload a machine just after one that might
- go down or refuse connections. I'm not sure having the primary
- answer rotate round-robin really buys me anything, but at least
- randomizing all other answers helps keep clients from piling
- up on a single host due to rollover.
-
- vs. both:
-
- * Wanted to deploy and test without updating all nameservers to
- understand the new record. I also wanted per-nameserver control
- over who needed to know about the new record.
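
The per-nameserver control amounts to a small decision made for each record during a zone transfer. A rough sketch, assuming the poolns list is available as a set of addresses; xfer_type is a hypothetical name, T_POOL = 200 is the value mentioned earlier, and T_A = 1 is the standard A record type.

```python
T_A = 1       # standard A record type
T_POOL = 200  # value chosen for POOL, as noted in the general comments


def xfer_type(rtype, requester, poolns):
    """Return the record type to send to `requester` in a zone transfer.

    Nameservers listed in `poolns` receive pool records as T_POOL;
    everyone else receives them as plain T_A records.
    """
    if rtype == T_POOL and requester not in poolns:
        return T_A
    return rtype


if __name__ == "__main__":
    poolns = {"192.0.2.53"}  # example address, purely illustrative
    print(xfer_type(T_POOL, "192.0.2.53", poolns))
    print(xfer_type(T_POOL, "198.51.100.7", poolns))
```

With an empty poolns list this degrades to the default behaviour: every secondary sees ordinary A records.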
-
-
- My resulting code is a bit closer to the SA implementation than to
- CIP, but it does perform its operations at slightly different stages
- of processing than the SA code.
-
- In any event, that's how I see at least these three different
- implementations. There is a subset of the IETF DNS WG that is working
- on trying to standardize on this sort of thing - including linking up
- with actual load info, such as that done by TGV and IBM
- implementations - but I needed something by the end of the month.
- Isn't that always the case :-)
-
- Hopefully, a more standardized way of handling this sort of situation
- will be developed, but until that point if anyone is interested in my
- POOL code, feel free to let me know. Give me a few days to finish
- testing though.
-
- -- David
-
- /-----------------------------------------------------------------------\
- \ David Bolen \ Internet: db3l@ans.net /
- | Advanced Network & Services, Inc. \ Phone: (914) 789-5327 |
- / 100 Clearbrook Road, Elmsford, NY 10523 \ Fax: (914) 789-5310 \
- \-----------------------------------------------------------------------/
-